dask persist - Python for climatology, oceanography and atmospheric science

dask persist

Speed up by changing where computation happens: persist() and task-graph fusion

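When you reuse the same derived field many times (e.g., for multiple plots and statistics), it is often faster to call persist() once than to call .compute() on every downstream result. persist() evaluates the field and keeps its chunks in distributed worker memory, whereas each .compute() re-runs the whole upstream graph and gathers the full result on the client, which can also exhaust client memory.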

This is especially effective when the same intermediate (like EKE) feeds several downstream reductions: each reduction then starts from the stored chunks instead of recomputing the intermediate, and the scheduler has a smaller task graph to manage.

code: python

from dask.distributed import Client

client = Client()

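# Expensive intermediate field.
# ds is assumed to be an xarray Dataset, opened elsewhere, whose "u" and "v" are velocity anomalies.

eke = (0.5 * (ds["u"]**2 + ds["v"]**2)).chunk({"time": 24, "y": 512, "x": 512}).persist()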

# Downstream computations become cheaper because they reuse persisted EKE

eke_mean = eke.mean("time")

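# Many xarray versions require "time" to be a single chunk for quantile() along it.
# Rechunking makes each chunk span all time steps, so shrink y/x chunks if memory is tight.

eke_p95 = eke.chunk({"time": -1}).quantile(0.95, dim="time")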
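
The reuse effect can be checked on a toy problem. The sketch below is not the workflow above: it assumes plain dask.array on the default local scheduler (no Client), made-up all-ones velocity fields, and a hypothetical kinetic_energy function that counts how often it is called per chunk.

```python
import threading

import dask.array as da

# Counter for how often the "expensive" per-chunk function actually runs.
calls = {"n": 0}
_lock = threading.Lock()


def kinetic_energy(u, v):
    """Hypothetical stand-in for an expensive per-chunk computation."""
    with _lock:
        calls["n"] += 1
    return 0.5 * (u**2 + v**2)


# Toy velocities (illustration only): 48 time steps, 64 x 64 grid, 2 chunks in time.
u = da.ones((48, 64, 64), chunks=(24, 64, 64))
v = da.ones((48, 64, 64), chunks=(24, 64, 64))
eke = da.map_blocks(kinetic_energy, u, v, dtype=u.dtype)

# Ignore any call dask made while inferring the output metadata.
calls["n"] = 0

# Two separate compute() calls each rebuild eke from scratch.
eke.mean(axis=0).compute()
eke.max(axis=0).compute()
calls_without_persist = calls["n"]  # 2 chunks x 2 reductions

# persist() evaluates eke once and keeps its chunks in memory.
calls["n"] = 0
eke_p = eke.persist()
calls_for_persist = calls["n"]  # 2 chunks, computed once

# Reductions on the persisted array start from the stored chunks.
calls["n"] = 0
mean_p = eke_p.mean(axis=0).compute()
max_p = eke_p.max(axis=0).compute()
calls_after_persist = calls["n"]

print(calls_without_persist, calls_for_persist, calls_after_persist)
```

On a distributed cluster persist() returns immediately and the chunks fill in on the workers in the background, but the same reuse applies once they are in memory.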

Practical tip: persist() can consume substantial worker memory. In practice, it’s best to persist only the specific intermediates that are reused many times (and only at the stage where reuse actually starts), rather than persisting large, early-stage datasets indiscriminately.
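
A quick size estimate before persisting helps judge whether an intermediate will fit. The sketch below uses the chunk sizes from the example above; the float64 dtype and the full array shape are assumptions chosen only for illustration.

```python
import math

BYTES_PER_VALUE = 8  # assumes float64; use 4 for float32

# Chunk sizes from the example above.
chunk_shape = {"time": 24, "y": 512, "x": 512}

# Hypothetical full-array shape, for illustration only.
array_shape = {"time": 8760, "y": 2048, "x": 4096}

chunk_bytes = BYTES_PER_VALUE * math.prod(chunk_shape.values())
array_bytes = BYTES_PER_VALUE * math.prod(array_shape.values())
n_chunks = math.prod(
    math.ceil(array_shape[dim] / chunk_shape[dim]) for dim in array_shape
)

print(f"one chunk: {chunk_bytes / 2**20:.0f} MiB")
print(f"persisted array: {array_bytes / 2**30:.1f} GiB in {n_chunks} chunks")
```

With these assumed numbers, one chunk is 48 MiB and the whole persisted field is about 547.5 GiB, which has to fit in the combined worker memory alongside whatever the downstream tasks need. For a real xarray object, the nbytes attribute reports the same total without triggering computation.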